Kappa statistic to measure agreement beyond chance in free-response assessments

Authors

  • Marc Carpentier
  • Christophe Combescure
  • Laura Merlini
  • Thomas V. Perneger
Abstract

BACKGROUND: The usual kappa statistic requires that all observations be enumerated. However, in free-response assessments, only positive (or abnormal) findings are reported, while negative (or normal) findings are not. This situation occurs frequently in imaging and other diagnostic studies. We propose here a kappa statistic that is suitable for free-response assessments.

METHODS: We derived the equivalent of Cohen's kappa statistic for two raters under the assumption that the number of possible findings for any given patient is very large, as well as a formula for the sampling variance that is applicable to independent observations (for clustered observations, a bootstrap procedure is proposed). The proposed statistic was applied to a real-life dataset and compared with the common practice of collapsing observations within a finite number of regions of interest.

RESULTS: The free-response kappa is computed from the total numbers of discordant (b and c) and concordant positive (d) observations made in all patients, as 2d/(b + c + 2d). In 84 full-body magnetic resonance imaging procedures in children that were evaluated by 2 independent raters, the free-response kappa statistic was 0.820. Aggregation of results within regions of interest resulted in overestimation of agreement beyond chance.

CONCLUSIONS: The free-response kappa provides an estimate of agreement beyond chance in situations where only positive findings are reported by raters.
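The point estimate therefore depends only on the pooled counts b, c (findings reported by one rater but not the other) and d (findings reported by both raters). A minimal sketch in Python of this computation follows; the function names, the per-patient (b, c, d) layout, and the patient-level resampling used to stand in for the bootstrap procedure mentioned above are illustrative assumptions, not the authors' code.

    import numpy as np

    def free_response_kappa(b, c, d):
        # b, c: findings reported by only one of the two raters (discordant)
        # d   : findings reported by both raters (concordant positive)
        return 2 * d / (b + c + 2 * d)

    def bootstrap_ci(per_patient_counts, n_boot=2000, alpha=0.05, seed=0):
        # Patient-level bootstrap for clustered observations:
        # per_patient_counts has one row of (b, c, d) counts per patient.
        rng = np.random.default_rng(seed)
        counts = np.asarray(per_patient_counts)
        n = counts.shape[0]
        stats = []
        for _ in range(n_boot):
            b, c, d = counts[rng.integers(0, n, size=n)].sum(axis=0)
            stats.append(free_response_kappa(b, c, d))
        return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

    # Example with made-up pooled counts: free_response_kappa(12, 8, 90) -> 0.90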


Similar articles

Increasing the visibility of coding decisions in team-based qualitative research in nursing.

OBJECTIVES To examine the use of the multi-rater Kappa measure of agreement (Nonparametric Statistics for the Behavioural Sciences, McGraw Hill, New York, 1988) in team-based, mixed-method, qualitative nursing research. DESIGN The article presents an illustrative description of the application of the qualitative coding procedure and associated multi-rater Kappa measurement at four time points...


Calculating kappa measures of agreement and standard errors using SAS software: some tricks and traps

SAS/STAT® procedure FREQ is the place to start when you need to compute measures of rater or test agreement on the classic kappa scale (Cohen 1960), namely, the ratio of the actual improvement over chance to the maximum possible improvement over chance. But when you see the frustrating message "WARNING: AGREE statistics are computed only for tables where the number of rows equals the number of ...
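As a point of comparison outside SAS, the kappa scale described in that snippet (observed agreement corrected for the agreement expected by chance from the marginal totals) can be sketched in a few lines; the Python function below is a hypothetical illustration, not the PROC FREQ AGREE output.

    import numpy as np

    def cohens_kappa(table):
        # Cohen's kappa for a square contingency table
        # (rows: rater 1 categories, columns: rater 2 categories).
        t = np.asarray(table, dtype=float)
        n = t.sum()
        p_obs = np.trace(t) / n                        # observed agreement
        p_exp = t.sum(axis=1) @ t.sum(axis=0) / n**2   # agreement expected by chance
        return (p_obs - p_exp) / (1 - p_exp)

    # Example with made-up counts: cohens_kappa([[45, 5], [10, 40]]) -> 0.70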


How do patients with colorectal cancer perceive treatment and care compared with the treating health care professionals?

BACKGROUND Patient evaluations are widely used in quality assessment of health services. It is widely recognized that patients and professionals provide a different perspective on quality. However, the extent to which they differ and the conceptual areas in which they differ is not well understood. OBJECTIVES We sought to examine how well professional and patient assessments of hospital healt...


Understanding interobserver agreement: the kappa statistic.

Items such as physical exam findings, radiographic interpretations, or other diagnostic tests often rely on some degree of subjective interpretation by observers. Studies that measure the agreement between two or more observers should include a statistic that takes into account the fact that observers will sometimes agree or disagree simply by chance. The kappa statistic (or kappa coefficient) ...


Interrater reliability: the kappa statistic

The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While ther...



Journal:

Volume 17, Issue

Pages -

Publication date: 2017